Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L

Authors

  • Siddhartha Chatterjee
  • Leonardo R. Bachega
  • Peter Bergner
  • Kenneth A. Dockser
  • John A. Gunnels
  • Manish Gupta
  • Fred G. Gustavson
  • Christopher A. Lapkowski
  • Gary K. Liu
  • Mark P. Mendell
  • Rohini D. Nair
  • Charles D. Wait
  • T. J. Christopher Ward
  • Peng Wu
Abstract

We describe the design of a dual-issue single-instruction, multiple-data-like (SIMD-like) extension of the IBM PowerPC 440 floating-point unit (FPU) core and the compiler and algorithmic techniques to exploit it. This extended FPU is targeted at both the IBM massively parallel Blue Gene/L machine and the more pervasive embedded platforms. We discuss the hardware and software codesign that was essential in order to fully realize the performance benefits of the FPU when constrained by the memory bandwidth limitations and high penalties for misaligned data access imposed by the memory hierarchy on a Blue Gene/L node. Using both hand-optimized and compiled code for key linear algebraic kernels, we validate the architectural design choices, evaluate the success of the compiler, and quantify the effectiveness of the novel algorithm design techniques. Our measurements show that the combination of algorithm, compiler, and hardware delivers a significant fraction of peak floating-point performance for compute-bound kernels, such as matrix multiplication, and delivers a significant fraction of peak memory bandwidth for memory-bound kernels, such as DAXPY, while remaining largely insensitive to data alignment.
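For readers unfamiliar with DAXPY, the memory-bound kernel named in the abstract, the following is a minimal sketch in portable C. It is not code from the paper and uses no Blue Gene/L intrinsics; the function names `daxpy_scalar` and `daxpy_paired` are hypothetical. The paired version only illustrates the general idea of handling two doubles per step, as a two-way SIMD unit would, with a scalar step for an odd leftover element.

```c
#include <assert.h>
#include <stddef.h>

/* Reference DAXPY: y[i] = a * x[i] + y[i], one element per iteration. */
void daxpy_scalar(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Same kernel written over pairs of elements (illustrative assumption:
 * each pair stands in for one two-way SIMD operation). A trailing
 * scalar step handles the last element when n is odd. */
void daxpy_paired(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        double y0 = a * x[i]     + y[i];
        double y1 = a * x[i + 1] + y[i + 1];
        y[i]     = y0;
        y[i + 1] = y1;
    }
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```

Each element needs two loads and one store for a single multiply-add, which is why a kernel of this shape tends to be limited by memory bandwidth rather than by floating-point throughput.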


Related resources

Vectorization techniques for the Blue Gene/L double FPU

This paper presents vectorization techniques tailored to meet the specifics of the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit (FPU), which is a core element of the node application-specific integrated circuit (ASIC) chips of the IBM 360-teraflops Blue Gene/L supercomputer. This paper focuses on the general-purpose basic-block vectorization and optimiza...


Automatically Tuned FFTs for BlueGene/L's Double FPU

IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system featuring a peak performance of 360 Tflop/s. This system is expected to lead the Top 500 list when it is installed in 2005 at the Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueG...


Design of the IBM Blue Gene/Q Compute chip

The heart of a Blue Gene/Q system is the Blue Gene/Q Compute (BQC) chip, which combines processors, memory, and communication functions on a single chip. The Blue Gene/Q Compute chip has 16 + 1 + 1 processor cores, each with a quad single-instruction, multiple-data (SIMD) floating-point unit, and a multi-versioned Level 2 cache that provides hardware support for transactional memory, speculati...


EUDOC on the IBM Blue Gene/L system: Accelerating the transfer of drug discoveries from laboratory to patient

Y.-P. Pang, T. J. Mullins, B. A. Swartz, J. S. McAllister, B. E. Smith, C. J. Archer, R. G. Musselman, A. E. Peters, B. P. Wallenfelt, K. W. Pinnow. EUDOC is a molecular docking program that has successfully helped to identify new drug leads. This virtual screening (VS) tool identifies drug candidates by com...


Vectorization Techniques for BlueGene/L’s Double FPU

This paper presents vectorization techniques tailored to meet the specifics of the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit, which is a core element of the node ASICs of IBM's 360 Tflop/s supercomputer BlueGene/L. The paper focuses on the general-purpose basic-block vectorization methods provided by the Vienna MAP vectorizer. In addition, the paper int...



Journal:
  • IBM Journal of Research and Development

Volume 49

Publication date: 2005